From Paninian Sandhi to Finite State Calculus
Abstract
The most authoritative description of the morphophonemic rules that apply at word boundaries (external sandhi) in Sanskrit is by the great grammarian Pāṇini (fl. 5th c. B.C.E.). These rules are stated formally in Pāṇini’s grammar, the Aṣṭādhyāyī ‘group of eight chapters’. The present paper summarizes Pāṇini’s handling of sandhi, his notational conventions, and formal properties of his theory. An XML vocabulary for expressing Pāṇini’s morphophonemic rules is then introduced, in which his rules for sandhi have been expressed. Although Pāṇini’s notation potentially exceeds a finite state grammar in power, individual rules do not rewrite their own output, and thus they may be automatically translated into a rule cascade from which a finite state transducer can be compiled.

1 Sandhi in Sanskrit

Sanskrit possesses a set of morphophonemic rules (both obligatory and optional) that apply at morpheme boundaries and at word boundaries (the latter are also termed pada boundaries). Rules that apply at morpheme boundaries are called internal sandhi (< saṃdhi ‘putting together’); rules that apply at word boundaries are called external sandhi. This paper considers only external sandhi.

Sandhi rules involve processes such as assimilation and vowel coalescence. Some examples of external sandhi are: na asti > nāsti ‘is not’, tat ca > tac ca ‘and this’, etat hi > etad dhi ‘for this’, devas api > devo ’pi ‘also a god’.¹

2 Sandhi in Pāṇini’s grammar

Pāṇini’s Aṣṭādhyāyī is a complete grammar of Sanskrit, covering phonology, morphology, syntax, semantics, and even pragmatics. It contains about 4000 rules (termed sūtra, literally ‘thread’), divided among eight chapters (termed adhyāya). Conciseness (lāghava) is a fundamental principle in Pāṇini’s formulation of carefully interrelated rules (Smith, 1992). Rules are either operational (i.e. they specify a particular linguistic operation, or kārya) or interpretive (i.e. they define the scope of operational rules). Rules may be either obligatory or optional.

A brief review of some well-known aspects of Pāṇini’s grammar is in order. The operational rules relevant to sandhi specify that a substituend (sthānin) is replaced by a substituens (ādeśa) in a given context (Cardona, 1965b, 308). Rules are written using metalinguistic case conventions, so that the substituend is marked as genitive, the substituens as nominative, the left context as ablative (tasmāt), and the right context as locative (tasmin). For instance:

    8.4.62  jhayo     ho     ’nyatarasyām
            jhaY-ABL  h-GEN  optionally

This rule specifies that (optionally) a homogeneous sound replaces h when preceded by a sound termed jhaY, i.e. an oral stop (Sharma, 2003, 783–784).

Pāṇini uses abbreviatory labels (termed pratyāhāra) to describe phonological classes. These labels are interpreted in the context of an ancillary text of the Aṣṭādhyāyī, the Śivasūtras, which enumerate a catalog of sounds (varṇasamāmnāya) in fourteen classes (Cardona, 1969, 6).

∗ This work has been supported by NSF grant IIS0535207. Any opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation. The paper has benefited from comments by Peter M. Scharf and by four anonymous referees.

¹ The symbol 〈’〉 (avagraha) does not represent a phoneme but is an orthographic convention to indicate the prodelision of an initial a-.
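The case-marked slots of rule 8.4.62 described above can be sketched in Python as a small data structure plus an application function. This is only an illustration under stated assumptions, not the XML vocabulary or the transducer compilation that the paper introduces: the class `BoundaryRule`, the function `apply_rule`, and the table `HOMOGENEOUS_ASPIRATE` are hypothetical names, and for brevity the left context lists only the voiced unaspirated stops, a subset of jhaY that can be handled as single characters in transliteration. The voicing of the final stop (etat > etad) is assumed to have been carried out by a separate rule that is not modelled here.

```python
from dataclasses import dataclass

# Assumed illustration data: voiced unaspirated stops (a subset of jhaY)
# mapped to the voiced aspirate with the same place of articulation.
HOMOGENEOUS_ASPIRATE = {"g": "gh", "j": "jh", "ḍ": "ḍh", "d": "dh", "b": "bh"}


@dataclass(frozen=True)
class BoundaryRule:
    """Hypothetical container mirroring the case-marked slots of a rule."""
    number: str              # rule number in the grammar
    substituend: str         # genitive slot: the sound to be replaced
    left_context: frozenset  # ablative slot: sounds that must precede
    optional: bool           # True if the rule may be left unapplied


RULE_8_4_62 = BoundaryRule("8.4.62", "h", frozenset(HOMOGENEOUS_ASPIRATE), True)


def apply_rule(rule, left_word, right_word):
    """Return every permitted output for two adjacent words."""
    unchanged = f"{left_word} {right_word}"
    final = left_word[-1:]
    if final not in rule.left_context or not right_word.startswith(rule.substituend):
        return [unchanged]  # context not met: the rule does not apply
    # The substituens is chosen to be homogeneous with the preceding stop.
    substituens = HOMOGENEOUS_ASPIRATE[final]
    changed = f"{left_word} {substituens}{right_word[len(rule.substituend):]}"
    # An optional rule licenses both the unchanged and the changed form.
    return [unchanged, changed] if rule.optional else [changed]


print(apply_rule(RULE_8_4_62, "etad", "hi"))   # ['etad hi', 'etad dhi']
print(apply_rule(RULE_8_4_62, "na", "asti"))   # ['na asti']
```

Returning a list of outputs is one simple way to represent optionality: an obligatory rule yields a single form, while an optional rule yields both alternatives.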
Similar resources
Computational Algorithms Based on the Paninian System to Process Euphonic Conjunctions for Word Searches
Searching for words in Sanskrit E-text is a problem that is accompanied by complexities introduced by features of Sanskrit such as euphonic conjunctions or ‘sandhis’. A word could occur in an E-text in a transformed form owing to the operation of rules of sandhi. Simple word search would not yield these transformed forms of the word. Further, there is no search engine in the literature that can...
Automatic Sanskrit Segmentizer Using Finite State Transducers
In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi ...
Proceedings of First International Symposium on Sanskrit Computational Linguistics
The most authoritative description of the morphophonemic rules that apply at word boundaries (external sandhi) in Sanskrit is by the great grammarian Pāṇini (fl. 5th c. B.C.E.). These rules are stated formally in Pāṇini’s grammar, the Aṣṭādhyāyī ‘group of eight chapters’. The present paper summarizes Pāṇini’s handling of sandhi, his notational conventions, and formal properties of h...
Relative Clauses In Hindi And Arabic: A Paninian Dependency Grammar Analysis
We present a comparative analysis of relative clauses in Hindi and Arabic in the tradition of the Paninian Grammar Framework (Bharati et al., 1996b) which leads to deriving a common logical form for equivalent sentences. Parallels are drawn between the Hindi co-relative construction and resumptive pronouns in Arabic. The analysis arises from the development of lexicalised dependency grammars fo...
An Annotation Scheme for English Language using Paninian Framework
This paper presents a comprehensive study of Panini’s karaka relations for English. The Paninian framework is suitable for all Indian languages, but some issues arise when it is applied to English. This paper discusses these issues and the different approaches that have been used in the past.